804 research outputs found

    A Template for Implementing Fast Lock-free Trees Using HTM

    Full text link
    Algorithms that use hardware transactional memory (HTM) must provide a software-only fallback path to guarantee progress. The design of the fallback path can have a profound impact on performance. If the fallback path is allowed to run concurrently with hardware transactions, then hardware transactions must be instrumented, adding significant overhead. Otherwise, hardware transactions must wait for any processes on the fallback path, causing concurrency bottlenecks, or move to the fallback path. We introduce an approach that combines the best of both worlds. The key idea is to use three execution paths: an HTM fast path, an HTM middle path, and a software fallback path, such that the middle path can run concurrently with each of the other two. The fast path and fallback path do not run concurrently, so the fast path incurs no instrumentation overhead. Furthermore, fast path transactions can move to the middle path instead of waiting or moving to the software path. We demonstrate our approach by producing an accelerated version of the tree update template of Brown et al., which can be used to implement fast lock-free data structures based on down-trees. We used the accelerated template to implement two lock-free trees: a binary search tree (BST), and an (a,b)-tree (a generalization of a B-tree). Experiments show that, with 72 concurrent processes, our accelerated (a,b)-tree performs between 4.0x and 4.2x as many operations per second as an implementation obtained using the original tree update template

    Tuning the Level of Concurrency in Software Transactional Memory: An Overview of Recent Analytical, Machine Learning and Mixed Approaches

    Get PDF
    Synchronization transparency offered by Software Transactional Memory (STM) must not come at the expense of run-time efficiency, thus demanding from the STM-designer the inclusion of mechanisms properly oriented to performance and other quality indexes. Particularly, one core issue to cope with in STM is related to exploiting parallelism while also avoiding thrashing phenomena due to excessive transaction rollbacks, caused by excessively high levels of contention on logical resources, namely concurrently accessed data portions. A means to address run-time efficiency consists in dynamically determining the best-suited level of concurrency (number of threads) to be employed for running the application (or specific application phases) on top of the STM layer. For too low levels of concurrency, parallelism can be hampered. Conversely, over-dimensioning the concurrency level may give rise to the aforementioned thrashing phenomena caused by excessive data contention—an aspect which has reflections also on the side of reduced energy-efficiency. In this chapter we overview a set of recent techniques aimed at building “application-specific” performance models that can be exploited to dynamically tune the level of concurrency to the best-suited value. Although they share some base concepts while modeling the system performance vs the degree of concurrency, these techniques rely on disparate methods, such as machine learning or analytic methods (or combinations of the two), and achieve different tradeoffs in terms of the relation between the precision of the performance model and the latency for model instantiation. Implications of the different tradeoffs in real-life scenarios are also discussed

    Progressive Transactional Memory in Time and Space

    Full text link
    Transactional memory (TM) allows concurrent processes to organize sequences of operations on shared \emph{data items} into atomic transactions. A transaction may commit, in which case it appears to have executed sequentially or it may \emph{abort}, in which case no data item is updated. The TM programming paradigm emerged as an alternative to conventional fine-grained locking techniques, offering ease of programming and compositionality. Though typically themselves implemented using locks, TMs hide the inherent issues of lock-based synchronization behind a nice transactional programming interface. In this paper, we explore inherent time and space complexity of lock-based TMs, with a focus of the most popular class of \emph{progressive} lock-based TMs. We derive that a progressive TM might enforce a read-only transaction to perform a quadratic (in the number of the data items it reads) number of steps and access a linear number of distinct memory locations, closing the question of inherent cost of \emph{read validation} in TMs. We then show that the total number of \emph{remote memory references} (RMRs) that take place in an execution of a progressive TM in which nn concurrent processes perform transactions on a single data item might reach Ω(nlogn)\Omega(n \log n), which appears to be the first RMR complexity lower bound for transactional memory.Comment: Model of Transactional Memory identical with arXiv:1407.6876, arXiv:1502.0272

    LNCS

    Get PDF
    Concurrent accesses to shared data structures must be synchronized to avoid data races. Coarse-grained synchronization, which locks the entire data structure, is easy to implement but does not scale. Fine-grained synchronization can scale well, but can be hard to reason about. Hand-over-hand locking, in which operations are pipelined as they traverse the data structure, combines fine-grained synchronization with ease of use. However, the traditional implementation suffers from inherent overheads. This paper introduces snapshot-based synchronization (SBS), a novel hand-over-hand locking mechanism. SBS decouples the synchronization state from the data, significantly improving cache utilization. Further, it relies on guarantees provided by pipelining to minimize synchronization that requires cross-thread communication. Snapshot-based synchronization thus scales much better than traditional hand-over-hand locking, while maintaining the same ease of use

    Molecular basis for passive immunotherapy of Alzheimer's disease

    Get PDF
    Amyloid aggregates of the amyloid-{beta} (A{beta}) peptide are implicated in the pathology of Alzheimer's disease. Anti-A{beta} monoclonal antibodies (mAbs) have been shown to reduce amyloid plaques in vitro and in animal studies. Consequently, passive immunization is being considered for treating Alzheimer's, and anti-A{beta} mAbs are now in phase II trials. We report the isolation of two mAbs (PFA1 and PFA2) that recognize A{beta} monomers, protofibrils, and fibrils and the structures of their antigen binding fragments (Fabs) in complex with the A{beta}(1–8) peptide DAEFRHDS. The immunodominant EFRHD sequence forms salt bridges, hydrogen bonds, and hydrophobic contacts, including interactions with a striking WWDDD motif of the antigen binding fragments. We also show that a similar sequence (AKFRHD) derived from the human protein GRIP1 is able to cross-react with both PFA1 and PFA2 and, when cocrystallized with PFA1, binds in an identical conformation to A{beta}(1–8). Because such cross-reactivity has implications for potential side effects of immunotherapy, our structures provide a template for designing derivative mAbs that target A{beta} with improved specificity and higher affinity

    Learning an atlas of a cognitive process in its functional geometry

    Get PDF
    Proceedings of the 22nd International Conference, IPMI 2011, Kloster Irsee, Germany, July 3-8, 2011.In this paper we construct an atlas that captures functional characteristics of a cognitive process from a population of individuals. The functional connectivity is encoded in a low-dimensional embedding space derived from a diffusion process on a graph that represents correlations of fMRI time courses. The atlas is represented by a common prior distribution for the embedded fMRI signals of all subjects. The atlas is not directly coupled to the anatomical space, and can represent functional networks that are variable in their spatial distribution. We derive an algorithm for fitting this generative model to the observed data in a population. Our results in a language fMRI study demonstrate that the method identifies coherent and functionally equivalent regions across subjects.National Science Foundation (U.S.) (IIS/CRCNS 0904625)National Science Foundation (U.S.) (CAREER grant 0642971)National Institutes of Health (U.S.) (NCRR NAC P41- RR13218)National Institute of Biomedical Imaging and Bioengineering (U.S.) (U54-EB005149)National Institutes of Health (U.S.) (U41RR019703)National Institutes of Health (U.S.) (P01CA067165)Seventh Framework Programme (European Commission) (n◦257528 (KHRESMOI)

    Pessimistic Software Lock-Elision

    Get PDF
    Read-write locks are one of the most prevalent lock forms in concurrent applications because they allow read accesses to locked code to proceed in parallel. However, they do not offer any parallelism between reads and writes. This paper introduces pessimistic lock-elision (PLE), a new approach for non-speculatively replacing read-write locks with pessimistic (i.e. non-aborting) software transactional code that allows read-write concurrency even for contended code and even if the code includes system calls. On systems with hardware transactional support, PLE will allow failed transactions, or ones that contain system calls, to preserve read-write concurrency. Our PLE algorithm is based on a novel encounter-order design of a fully pessimistic STM system that in a variety of benchmarks spanning from counters to trees, even when up to 40% of calls are mutating the locked structure, provides up to 5 times the performance of a state-of-the-art read-write lock.National Science Foundation (U.S.) (Grant 1217921
    corecore